feat(jdbc): extend OpenTelemetry instrumentation for metadata and pagination#12918
keshavdandeva wants to merge 6 commits into `jdbc/feature-branch-otel`
Code Review
This pull request integrates OpenTelemetry tracing into the BigQuery JDBC driver, focusing on ResultSet iteration and DatabaseMetaData operations. Key changes include capturing span contexts during result set initialization and propagating them during next() calls, as well as adding tracing to metadata methods like getTables, getColumns, and getSchemas. Review feedback identifies significant performance overhead from wrapping high-frequency next() methods in telemetry scopes and points out that spans for asynchronous metadata operations end prematurely, leading to inaccurate duration metrics and fragmented traces.
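The point about spans ending prematurely can be shown with a small model. If a metadata method hands its work to a background executor and ends the span before returning, the recorded duration covers only the hand-off. The sketch below assumes the work is exposed as a `CompletableFuture` and ends the span from the completion callback instead. `FakeSpan` and `traceAsync` are hypothetical plain-JDK stand-ins, not the driver's code. With the real OpenTelemetry API the callback would presumably call `span.recordException(...)`, `span.setStatus(StatusCode.ERROR)` and `span.end()`.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for an OpenTelemetry Span: records an error and whether end() ran.
final class FakeSpan {
    private volatile boolean ended;
    private volatile Throwable error;

    void recordError(Throwable t) { this.error = t; } // ~ recordException + setStatus(ERROR)
    void end() { this.ended = true; }

    boolean isEnded() { return ended; }
    Throwable error() { return error; }
}

final class AsyncMetadataTracing {
    private AsyncMetadataTracing() {}

    // Ends the span when the asynchronous work completes (normally or exceptionally),
    // not when this method returns, so the span's duration covers the whole operation.
    static <T> CompletableFuture<T> traceAsync(FakeSpan span, CompletableFuture<T> work) {
        return work.whenComplete((result, error) -> {
            if (error != null) {
                span.recordError(error);
            }
            span.end();
        });
    }
}
```

Because `whenComplete` runs on whichever thread completes the future, the span stays open for exactly as long as the background work takes.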
…er the blocking operation `buffer.take()`
/gemini review
Code Review
This pull request integrates OpenTelemetry tracing across the BigQuery JDBC driver, specifically targeting metadata operations and result set processing. Key changes include wrapping blocking operations in appropriate tracing scopes, propagating span contexts to background threads, and standardizing tracer acquisition via a new utility method. Feedback focuses on refining the tracing implementation, such as correctly handling background span parents to avoid timeline anomalies, ensuring span statuses are updated on errors, and replacing redundant null checks on SpanContext with validity checks.
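The last suggestion rests on an OpenTelemetry convention: when no span is active, `Span.current().getSpanContext()` returns an invalid context with all-zero IDs rather than `null`, so a `!= null` check always passes. The plain-JDK model below illustrates why `isValid()` is the meaningful guard before linking to a parent. `MiniSpanContext` and `LinkPolicy` are hypothetical, and `isValid()` is simplified to check only for the all-zero sentinel.

```java
// Hypothetical, simplified stand-in for OpenTelemetry's SpanContext. As in OTel, "no active span"
// is represented by an INVALID sentinel with all-zero ids, never by null.
final class MiniSpanContext {
    static final String INVALID_TRACE_ID = "0".repeat(32); // 32 hex zeros
    static final String INVALID_SPAN_ID = "0".repeat(16);  // 16 hex zeros
    static final MiniSpanContext INVALID = new MiniSpanContext(INVALID_TRACE_ID, INVALID_SPAN_ID);

    final String traceId;
    final String spanId;

    MiniSpanContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    // Simplified: real OTel validation also checks the hex format and length of both ids.
    boolean isValid() {
        return !INVALID_TRACE_ID.equals(traceId) && !INVALID_SPAN_ID.equals(spanId);
    }
}

final class LinkPolicy {
    private LinkPolicy() {}

    // A bare null check would accept INVALID and attach a meaningless link; isValid() rejects it.
    static boolean shouldLink(MiniSpanContext parent) {
        return parent != null && parent.isValid();
    }
}
```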
```diff
 // Advance the cursor. Potentially blocking operation.
-BigQueryArrowBatchWrapper batchWrapper = this.buffer.take();
+BigQueryArrowBatchWrapper batchWrapper;
+try (Scope scope = Context.current().with(Span.wrap(originalSpanContext)).makeCurrent()) {
```
Why not include the entire block in the span? Line 261 performs deserialization, so it's still part of the operation.
Yeah, I did that earlier, but then the Gemini review said:
> The next() method is a high-frequency operation in JDBC, often called millions of times during result set iteration. Wrapping the entire method body in an OpenTelemetry Scope introduces significant overhead due to ThreadLocal access and object allocations for every single row. This is particularly inefficient when next() is simply incrementing an index for rows already available in memory (e.g., when isNested is true or when iterating within the current Arrow batch). Consider moving the context propagation to only the specific sections where blocking operations or external API calls occur, such as the block starting at line 243 where buffer.take() is called. Ensure that the scope is managed in an exception-safe manner (e.g., using try-with-resources) to prevent resource leaks.
But I guess for Arrow, since this is a batch of rows, it may be fine.
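One way to reconcile both positions is to keep the fast path (advancing an index within an in-memory batch) free of context work and open a scope only when `next()` has to call `buffer.take()`. Per-batch work such as Arrow deserialization can sit inside that same scope, because it runs once per batch rather than once per row. The sketch below is a plain-JDK model of that shape. `TraceContext`, `BatchedResultSet` and the empty-batch end-of-stream marker are hypothetical, and a `ThreadLocal<String>` stands in for the `Context.current().with(Span.wrap(originalSpanContext)).makeCurrent()` call in the diff.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for OpenTelemetry's Context: a ThreadLocal holding the current trace id.
final class TraceContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();
    static int scopesOpened = 0; // illustration only: counts how often a scope is entered

    // Like OTel's Scope: AutoCloseable without a checked exception on close().
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    private TraceContext() {}

    static String current() { return CURRENT.get(); }

    // Makes traceId current on this thread; close() restores whatever was current before.
    static Scope makeCurrent(String traceId) {
        scopesOpened++;
        String previous = CURRENT.get();
        CURRENT.set(traceId);
        return () -> CURRENT.set(previous);
    }
}

// Hypothetical result set that reads batches of rows produced by a background thread.
final class BatchedResultSet {
    private final BlockingQueue<List<String>> buffer;
    private final String originalTraceId; // captured on the thread that created the result set
    private List<String> batch = List.of();
    private int index = -1;
    private boolean done;
    String contextSeenDuringTake; // illustration only: the context active around take()

    BatchedResultSet(BlockingQueue<List<String>> buffer) {
        this.buffer = buffer;
        this.originalTraceId = TraceContext.current();
    }

    boolean next() {
        if (done) {
            return false;
        }
        // Fast path: the next row is already in memory, so no context work is done.
        if (index + 1 < batch.size()) {
            index++;
            return true;
        }
        // Slow path: only the potentially blocking take() (and any per-batch work, such as
        // deserializing an Arrow batch) runs under the captured context.
        try (TraceContext.Scope ignored = TraceContext.makeCurrent(originalTraceId)) {
            batch = buffer.take();
            contextSeenDuringTake = TraceContext.current();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting for the next batch", e);
        }
        if (batch.isEmpty()) { // an empty batch marks end-of-stream in this sketch
            done = true;
            return false;
        }
        index = 0;
        return true;
    }

    String getString() { return batch.get(index); }
}
```

For a result set with four rows in two batches, the scope is entered three times (two data batches plus the end marker) instead of once per call to `next()`. The same shape would apply to the JSON result set's `this.cursor = this.buffer.take()`.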
```diff
 try {
   // Advance the cursor. Potentially blocking operation
-  this.cursor = this.buffer.take();
+  try (Scope scope = Context.current().with(Span.wrap(originalSpanContext)).makeCurrent()) {
```
Gemini left the same comment here as well:
> Similar to the Arrow implementation, wrapping the entire next() method in a Scope creates substantial performance overhead for every row processed. Since next() is a hot path, the cost of managing the OpenTelemetry context for every iteration can lead to a noticeable regression in throughput for large result sets. It is recommended to limit the scope of context propagation to the parts of the method that actually perform work requiring context, such as the blocking buffer.take() call. Ensure the scope is closed in an exception-safe manner to prevent leaks.
```java
    .addLink(parentSpanContext)
    .startSpan();

try (Scope scope = backgroundSpan.makeCurrent()) {
```
Can we move startSpan() into the try-with-resources too, so there's no need for manual cleanup at the end?
I looked into this, and unfortunately OTel does not make the Span interface AutoCloseable. Only the Scope returned by span.makeCurrent() is AutoCloseable, so that it can handle thread-local cleanup. That is why we have to call span.end() manually in the finally block.
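Because only the `Scope` is `AutoCloseable`, the usual shape is: start the span, enter `try (Scope ... = span.makeCurrent())`, mark the error status in a `catch`, and call `end()` in `finally`. The plain-JDK sketch below models that for a background pagination span that records a link to the caller's span (as `.addLink(parentSpanContext)` does in the diff) instead of a parent. `BackgroundSpan`, `SpanScope` and `runLinked` are hypothetical stand-ins, not the driver's or OpenTelemetry's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an OpenTelemetry Span. Like the real Span it is NOT AutoCloseable,
// so end() has to be called explicitly.
final class BackgroundSpan {
    enum Status { UNSET, ERROR }

    final String name;
    final List<String> links = new ArrayList<>(); // linked span ids (a link, not a parent)
    Status status = Status.UNSET;
    boolean ended;

    BackgroundSpan(String name, String linkedSpanId) {
        this.name = name;
        links.add(linkedSpanId);
    }

    void end() { ended = true; }
}

final class SpanScope {
    private static final ThreadLocal<BackgroundSpan> CURRENT = new ThreadLocal<>();

    // Like OTel's Scope: closing it restores the previous current span but does not end any span.
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    private SpanScope() {}

    static BackgroundSpan current() { return CURRENT.get(); }

    static Scope makeCurrent(BackgroundSpan span) {
        BackgroundSpan previous = CURRENT.get();
        CURRENT.set(span);
        return () -> CURRENT.set(previous);
    }

    // Runs one unit of background work (e.g. fetching a page) in a span linked to the caller's span.
    static <T> T runLinked(String name, String parentSpanId, Supplier<T> work) {
        BackgroundSpan span = new BackgroundSpan(name, parentSpanId); // ~ spanBuilder(name).addLink(...).startSpan()
        try (Scope ignored = makeCurrent(span)) {
            return work.get(); // success leaves the status UNSET
        } catch (RuntimeException e) {
            span.status = BackgroundSpan.Status.ERROR; // ~ span.setStatus(StatusCode.ERROR)
            throw e;
        } finally {
            span.end(); // the Scope is already closed here; the span itself must still be ended
        }
    }
}
```

Putting `end()` in `finally` covers both the success and the exception path, which is what try-with-resources on the span would have provided if `Span` were closeable.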
b/491245568
Key Changes
Core Instrumentation Logic
- … `BigQueryDatabaseMetaData.java` (`getCatalogs`, `getSchemas`, `getTables`, `getColumns`) to capture underlying API calls.
- … `fetchNextPages` in `BigQueryStatement.java` and linked background pagination spans back to it, avoiding timeline anomalies.
- … `SpanContext` in `BigQueryBaseResultSet.java` at creation time and made it current during `next()` in `BigQueryJsonResultSet.java` and `BigQueryArrowResultSet.java` to survive thread hops.
- … `getSafeTracer` to `BigQueryJdbcOpenTelemetry.java` as a static utility to ensure consistent fallback behavior across the driver.
- … `populateArrowBufferedQueue` in `BigQueryStatement.java` to its own private method `processArrowStream` to improve readability and maintainability.
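The description does not show the body of `getSafeTracer`. A minimal sketch of the "consistent fallback" idea, assuming the fallback is a no-op tracer, could look like the following. `MiniTracer`, `TracerUtil` and the `Supplier`-based signature are hypothetical; with the real API the fallback would typically be something like `OpenTelemetry.noop().getTracer(...)`.

```java
import java.util.function.Supplier;

// Hypothetical minimal tracer interface standing in for io.opentelemetry.api.trace.Tracer.
interface MiniTracer {
    String startSpan(String name);
}

final class TracerUtil {
    // A tracer that records nothing, standing in for a no-op OpenTelemetry tracer.
    static final MiniTracer NOOP = name -> "noop";

    private TracerUtil() {}

    // Hypothetical "safe" lookup: a missing, null or failing configuration falls back to the
    // no-op tracer, so call sites across the driver never need their own null checks.
    static MiniTracer getSafeTracer(Supplier<MiniTracer> configured) {
        if (configured == null) {
            return NOOP;
        }
        try {
            MiniTracer tracer = configured.get();
            return tracer != null ? tracer : NOOP;
        } catch (RuntimeException e) {
            return NOOP;
        }
    }
}
```

Centralizing the fallback in one static method means every instrumented call site gets the same behavior when tracing is not configured.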